68 research outputs found
Semi-Supervised Deep Regression with Uncertainty Consistency and Variational Model Ensembling via Bayesian Neural Networks
Deep regression is an important problem with numerous applications. These
range from computer vision tasks such as age estimation from photographs, to
medical tasks such as ejection fraction estimation from echocardiograms for
disease tracking. However, semi-supervised approaches for deep regression
remain notably under-explored compared to classification and segmentation tasks.
Unlike classification tasks, which rely on thresholding functions for
generating class pseudo-labels, regression tasks use real-valued target
predictions directly as pseudo-labels, making them more sensitive to prediction
quality. In this work, we propose a novel approach to semi-supervised
regression, namely Uncertainty-Consistent Variational Model Ensembling (UCVME),
which improves training by generating high-quality pseudo-labels and
uncertainty estimates for heteroscedastic regression. Given that aleatoric
uncertainty by definition depends only on the input data and should therefore
be equal for the same inputs, we present a novel uncertainty consistency loss for
co-trained models. Our consistency loss significantly improves uncertainty
estimates and allows higher quality pseudo-labels to be assigned greater
importance under heteroscedastic regression. Furthermore, we introduce a novel
variational model ensembling approach to reduce prediction noise and generate
more robust pseudo-labels. We analytically show our method generates higher
quality targets for unlabeled data and further improves training. Experiments
show that our method outperforms state-of-the-art alternatives on different
tasks and can be competitive with supervised methods that use full labels. Our
code is available at https://github.com/xmed-lab/UCVME.
Comment: Accepted by AAAI 2023
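A minimal sketch of the two losses the abstract describes, assuming two co-trained networks that each output a mean and a log-variance; the function names, loss weighting, and the single forward pass per model are illustrative simplifications (variational model ensembling would average several stochastic forward passes, e.g. with dropout kept on):

```python
import torch
import torch.nn.functional as F

def heteroscedastic_nll(mean, log_var, target):
    # Gaussian NLL with input-dependent (aleatoric) variance sigma^2 = exp(s):
    # 0.5 * (exp(-s) * (y - mu)^2 + s), up to a constant.
    return 0.5 * (torch.exp(-log_var) * (target - mean) ** 2 + log_var).mean()

def ucvme_style_losses(mean_a, log_var_a, mean_b, log_var_b, target=None):
    # Uncertainty consistency: aleatoric variance depends only on the input,
    # so the two co-trained models should predict the same log-variance.
    unc_consistency = F.mse_loss(log_var_a, log_var_b)
    if target is not None:  # labeled batch
        sup = heteroscedastic_nll(mean_a, log_var_a, target) + \
              heteroscedastic_nll(mean_b, log_var_b, target)
        return sup + unc_consistency
    # Unlabeled batch: ensemble both models into a pseudo-label; the ensembled
    # log-variance weights high-quality pseudo-labels more heavily via exp(-s).
    pseudo_mean = 0.5 * (mean_a + mean_b).detach()
    pseudo_log_var = 0.5 * (log_var_a + log_var_b).detach()
    unsup = heteroscedastic_nll(mean_a, pseudo_log_var, pseudo_mean) + \
            heteroscedastic_nll(mean_b, pseudo_log_var, pseudo_mean)
    return unsup + unc_consistency
```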
Radiomics-Informed Deep Learning for Classification of Atrial Fibrillation Sub-Types from Left-Atrium CT Volumes
Atrial Fibrillation (AF) is characterized by rapid, irregular heartbeats, and
can lead to fatal complications such as heart failure. The disease is divided
into two sub-types based on severity, which can be automatically classified
from CT volumes to screen for severe cases. However, existing
classification approaches rely on generic radiomic features that may not be
optimal for the task, whilst deep learning methods tend to over-fit to the
high-dimensional volume inputs. In this work, we propose a novel
radiomics-informed deep-learning method, RIDL, that combines the advantages of
deep learning and radiomic approaches to improve AF sub-type classification.
Unlike existing hybrid techniques that mostly rely on naïve feature
concatenation, we observe that radiomic feature selection methods can serve as
an information prior, and propose supplementing low-level deep neural network
(DNN) features with locally computed radiomic features. This reduces DNN
over-fitting and allows local variations between radiomic features to be better
captured. Furthermore, we ensure complementary information is learned by deep
and radiomic features by designing a novel feature de-correlation loss.
Combined, our method addresses the limitations of deep learning and radiomic
approaches and outperforms state-of-the-art radiomic, deep learning, and hybrid
approaches, achieving 86.9% AUC for the AF sub-type classification task. Code
is available at https://github.com/xmed-lab/RIDL.
Comment: Accepted by MICCAI 2023
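One plausible form of such a de-correlation loss, sketched here as an assumption (the paper's exact formulation may differ): penalize the cross-correlation between batch-standardized deep and radiomic features.

```python
import torch

def decorrelation_loss(deep_feats, radiomic_feats, eps=1e-8):
    """Penalize cross-correlation between two feature sets.

    deep_feats:     (batch, d1) low-level DNN features
    radiomic_feats: (batch, d2) locally computed radiomic features
    """
    # Standardize each feature dimension over the batch.
    z1 = (deep_feats - deep_feats.mean(0)) / (deep_feats.std(0) + eps)
    z2 = (radiomic_feats - radiomic_feats.mean(0)) / (radiomic_feats.std(0) + eps)
    # Cross-correlation matrix between the two feature sets.
    corr = z1.T @ z2 / deep_feats.shape[0]
    # Drive all cross-correlations toward zero so the two branches
    # learn complementary, non-redundant information.
    return corr.pow(2).mean()
```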
MFR-Net: Multi-faceted Responsive Listening Head Generation via Denoising Diffusion Model
Face-to-face communication is a common scenario involving the roles of speakers
and listeners. Most existing research methods focus on producing speaker
videos, while the generation of listener heads remains largely overlooked.
Responsive listening head generation is an important task that aims to model
face-to-face communication scenarios by generating a listener head video given
a speaker video and a listener head image. An ideal generated responsive
listening video should respond to the speaker by expressing attitudes or
viewpoints while maintaining diversity in interaction patterns and accuracy in
listener identity. To achieve this goal, we propose the
Multi-Faceted Responsive Listening Head Generation
Network (MFR-Net). Specifically, MFR-Net employs the probabilistic denoising
diffusion model to predict diverse head pose and expression features. In order
to produce multi-faceted responses to the speaker video while maintaining
accurate listener identity preservation, we design the Feature Aggregation
Module to boost listener identity features and fuse them with other
speaker-related features. Finally, a renderer finetuned with identity
consistency loss produces the final listening head videos. Our extensive
experiments demonstrate that MFR-Net achieves multi-faceted responses not only
in diversity and listener identity preservation but also in attitude and
viewpoint expression.
Comment: Accepted by ACM MM 2023
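The abstract gives no implementation details, but the sampling stage it describes is a standard denoising diffusion loop; below is a minimal sketch, assuming a noise-prediction network `eps_model` conditioned on speaker and listener-identity features (the names and the conditioning interface are illustrative assumptions, not MFR-Net's actual design):

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, cond, betas, shape):
    """Minimal DDPM ancestral sampling for a pose/expression feature vector.

    eps_model(x_t, t, cond) predicts the noise added at step t;
    cond packs speaker-video and listener-identity features.
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps = eps_model(x, t_batch, cond)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # diverse features: each call resamples a different response
```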
OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions
One-shot talking head generation has no explicit head movement reference, so
it is difficult to generate talking heads with natural head motions. Some
existing works only edit the mouth area and generate still talking heads,
leading to unrealistic results. Other works construct a one-to-one mapping
between the audio signal and head motion sequences, which introduces ambiguous
correspondences, since people move their heads differently when speaking the
same content. Such a mapping fails to model this diversity and produces head
motions that are either nearly static or exaggerated, and hence unnatural.
The one-shot talking head generation task is therefore an ill-posed
one-to-many problem. Based on this observation, we propose
OSM-Net, a \textit{one-to-many} one-shot talking head generation network with
natural head motions. OSM-Net constructs a motion space that contains rich and
diverse clip-level head motion features. Each basis of the space represents a
feature of meaningful head motion over a clip rather than just a frame, thus
providing more coherent and natural motion changes in talking heads. The
driving audio is mapped into the motion space, and various motion features can
be sampled around the mapped point within a reasonable range to achieve the
one-to-many mapping. In addition, a landmark constraint and time-window
feature inputs improve expression feature extraction and video generation. Extensive
experiments show that OSM-Net generates more natural and realistic head
motions under a reasonable one-to-many mapping paradigm compared with other
methods.
Comment: Paper under review
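A minimal sketch of the one-to-many sampling idea, assuming a learned mapping `audio_to_motion` into the clip-level motion space and a simple fixed-radius neighborhood (both are illustrative assumptions; the paper's "reasonable range" may be defined differently):

```python
import torch

def sample_motion_feature(audio_feat, audio_to_motion, radius=0.1):
    """One-to-many sampling sketch: map the driving audio to a point in
    a clip-level motion space, then draw a motion feature from a small
    neighborhood so the same audio can yield different, plausible motions."""
    center = audio_to_motion(audio_feat)             # point in motion space
    direction = torch.randn_like(center)
    direction = direction / direction.norm(dim=-1, keepdim=True)
    # Uniform radius within the ball keeps samples within a bounded range.
    r = radius * torch.rand(center.shape[0], 1)
    return center + r * direction
```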
FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions
One-shot talking head generation has received growing attention in recent
years, with various creative and practical applications. An ideal natural and
vivid generated talking head video should contain natural head pose changes.
However, it is challenging to map head pose sequences from driving audio since
there exists a natural gap between the audio and visual modalities. In this
work, we propose a Flow-guided One-shot model that achieves NaTural head
motions (FONT)
over generated talking heads. Specifically, the head pose prediction module is
designed to generate head pose sequences from the source face and driving
audio. We add a random sampling operation and a structural similarity
constraint to model the diversity in the one-to-many mapping between the audio
and visual modalities, thus predicting natural head poses. Then we develop a
keypoint predictor that produces unsupervised keypoints from the source face,
driving audio and pose sequences to describe the facial structure information.
Finally, a flow-guided occlusion-aware generator is employed to produce
photo-realistic talking head videos from the estimated keypoints and source
face. Extensive experimental results show that FONT generates talking heads
with natural head poses and synchronized mouth shapes, outperforming the other
compared methods.
Comment: Accepted by ICME 2023
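A hedged sketch of the flow-guided, occlusion-aware warping step that generators of this kind typically use (the tensor layout and the way the flow and mask are predicted are assumptions, not FONT's exact design):

```python
import torch
import torch.nn.functional as F

def flow_guided_generate(src_feat, flow, occlusion):
    """Warp source-face features with a predicted dense flow, then mask
    regions the flow cannot explain so the decoder inpaints them.

    src_feat:  (B, C, H, W) encoded source face
    flow:      (B, H, W, 2) sampling grid in [-1, 1] derived from keypoints
    occlusion: (B, 1, H, W) soft mask in [0, 1]
    """
    warped = F.grid_sample(src_feat, flow, align_corners=True)
    return occlusion * warped  # decoder fills in the masked-out regions
```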
OPT: One-shot Pose-Controllable Talking Head Generation
One-shot talking head generation produces lip-sync talking heads based on
arbitrary audio and one source face. To guarantee naturalness and realism,
recent methods propose to achieve free pose control instead of simply editing
mouth areas. However, existing methods do not accurately preserve the identity
of the source face when generating head motions. To solve the identity mismatch
problem and achieve high-quality free pose control, we present One-shot
Pose-controllable Talking head generation network (OPT). Specifically, the
Audio Feature Disentanglement Module separates content features from the audio,
eliminating the influence of speaker-specific information contained in
arbitrary driving audio. The mouth expression feature is then extracted from
the content feature and source face, with a landmark loss designed to improve
facial-structure accuracy and identity preservation.
Finally, to achieve free pose control, controllable head pose features from
reference videos are fed into the Video Generator along with the expression
feature and source face to generate new talking heads. Extensive quantitative
and qualitative experimental results verify that OPT generates high-quality
pose-controllable talking heads with no identity mismatch problem,
outperforming previous state-of-the-art methods.
Comment: Accepted by ICASSP 2023
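The abstract does not specify how speaker-specific information is removed; one common mechanism for this kind of disentanglement is adversarial training with a gradient-reversal layer, sketched below purely as an illustrative assumption (the module and layer names are hypothetical, not OPT's actual design):

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad

class ContentEncoder(nn.Module):
    """Extract content features while adversarially removing speaker
    identity: a speaker classifier trained through a reversed gradient
    pushes the encoder toward speaker-invariant features."""
    def __init__(self, audio_dim=80, feat_dim=256, num_speakers=100):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, feat_dim, batch_first=True)
        self.speaker_head = nn.Linear(feat_dim, num_speakers)

    def forward(self, audio):
        content, _ = self.encoder(audio)          # (B, T, feat_dim)
        spk_logits = self.speaker_head(GradReverse.apply(content.mean(1)))
        return content, spk_logits  # train spk_logits with cross-entropy
```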
On the validity of the local Fourier analysis
Local Fourier analysis (LFA) is a useful tool in predicting the convergence
factors of geometric multigrid methods (GMG). As is well known, on rectangular
domains with periodic boundary conditions this analysis gives the exact
convergence factors of such methods. In this work, using the Fourier method, we
extend these results by proving that such analysis yields the exact convergence
factors for a wider class of problems.
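For concreteness, a standard one-dimensional example of the kind of analysis involved (a textbook computation, not a result of this paper): for the Poisson stencil smoothed by weighted Jacobi, LFA reduces the smoother to a multiplier on each Fourier mode.

```latex
% LFA of weighted Jacobi for the 1D Poisson stencil h^{-2}[-1 \;\; 2 \;\; -1]:
% on the Fourier mode \varphi(\theta, x) = e^{i \theta x / h}, the smoother
% acts as multiplication by its symbol
\[
  \widehat{S}(\theta) \;=\; 1 - \omega\,(1 - \cos\theta),
  \qquad \theta \in (-\pi, \pi],
\]
% and the smoothing factor measures the damping of the oscillatory modes:
\[
  \mu \;=\; \max_{\pi/2 \le |\theta| \le \pi} \bigl|\widehat{S}(\theta)\bigr|
  \;=\; \tfrac{1}{3} \quad \text{for } \omega = \tfrac{2}{3}.
\]
```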
COVID-19 vaccination willingness among people living with HIV in Shijiazhuang, China: a cross-sectional survey
Objectives: The COVID-19 pandemic imposed an enormous disease and economic burden worldwide. SARS-CoV-2 vaccination is essential to containing the pandemic. People living with HIV (PLWH) may be more vulnerable to severe COVID-19 outcomes; understanding their vaccination willingness and its influencing factors therefore helps in developing targeted vaccination strategies.
Methods: A cross-sectional study was conducted between 15 June and 30 August 2022 in Shijiazhuang, China. Variables included socio-demographic characteristics, health status, HIV-related characteristics, knowledge of and attitudes toward COVID-19 vaccination, and COVID-19 vaccination status. Multivariable logistic regression was used to identify factors associated with COVID-19 vaccination willingness among PLWH.
Results: A total of 1,428 PLWH were included, of whom 90.48% were willing to receive COVID-19 vaccination. Unwillingness was higher among PLWH who were female, had a fair or poor health status, had an allergic history or comorbidities, were unconvinced of or unsure about the effectiveness or safety of the vaccines, were convinced of or unsure about whether COVID-19 vaccination would affect ART efficacy, or did not know of at least one type of domestic COVID-19 vaccine. Approximately 93.00% of PLWH had received at least one dose of a COVID-19 vaccine, and 213 (14.92%) reported at least one adverse reaction within 7 days.
Conclusion: Our study found a relatively high willingness to receive COVID-19 vaccination among PLWH in Shijiazhuang. However, a small number of PLWH remained hesitant; more tailored policies or guidelines should therefore be implemented by the government to raise the COVID-19 vaccination rate among PLWH.
Prompt-to-afterglow transition of optical emission in a long gamma-ray burst consistent with a fireball
Long gamma-ray bursts (GRBs), which signify the end-of-life collapse of very
massive stars, are produced by extremely relativistic jets colliding with the
circumstellar medium. Huge energy is released both in the first few seconds,
namely the internal dissipation phase that powers prompt emissions, and in the
subsequent self-similar jet-deceleration phase that produces afterglows
observed across the broad-band electromagnetic spectrum. However, prompt
optical emissions of GRBs have rarely been detected, seriously limiting our
understanding of the transition between the two phases. Here we report the
detection of prompt optical emissions from a gamma-ray burst (i.e. GRB 201223A)
using a dedicated telescope array with a high temporal resolution and a wide
time coverage. The early phase, coincident with the prompt γ-ray emissions,
shows a luminosity greatly in excess of the extrapolation from the γ-rays,
while the later luminosity bump is consistent with the onset of the afterglow.
The clearly detected transition allows us to differentiate the physical
processes contributing to early optical emissions and to diagnose the
composition of the jet.
Comment: Authors' version of the article published in Nature Astronomy; see
their website for the official version.